Abstract

Recently we introduced a class of number representations denoted RN-representations,
allowing an unbiased rounding-to-nearest to take place by a simple truncation. In this paper
we briefly review the binary fixed-point representation in an encoding which is essentially
an ordinary 2's complement representation with an appended round-bit. Not only is this
rounding a constant time operation, so is also sign inversion, both of which are at best
log-time operations on ordinary 2's complement representations. Addition, multiplication
and division are defined in such a way that rounding information can be carried along in a
meaningful way, at minimal cost. Based on the fixed-point encoding we here define a floating
point representation, and describe in some detail a possible implementation of a floating point
arithmetic unit employing this representation, including also the directed roundings.

1 Introduction 


In [KM05] a class of number representations denoted RN-Codings was introduced, "RN" standing
for "round-to-nearest", as these radix-$\beta$, signed-digit representations have the property that
truncation yields rounding to the nearest representable value. They are based on a generalization of the
observation that certain radix representations are known to possess this property, e.g., the balanced
ternary ($\beta = 3$) system over the digit set $\{-1, 0, 1\}$. Another such representation is obtained by
performing the original Booth-recoding [Boo51] on a 2's complement number into the digit set
$\{-1, 0, 1\}$, where it is well-known that the non-zero digits of the recoded number alternate in sign.
Besides the simplicity of rounding by truncation, it has the feature that the effect of one rounding
followed by another rounding yields the same result, as would be obtained by a single rounding
to the same precision as the last. Double rounding is otherwise a known problem when using any
"extended precision" of the IEEE-754 standard [IEE08], e.g., performing computations in the
"extended-80" format (the Intel extended double precision) and storing the result in the binary
"basic-64" format.

When we are not concerned with the actual encoding of a value, we shall here use the notation
RN-representation. This representation was further discussed in [KMP11], where a special encoding
of the Booth-recoded binary representation was introduced. This encoding, termed the canonical
encoding, is based on the ordinary 2's complement representation, but with an appended round-bit,
allowing round-to-nearest by truncation. Arithmetic on operands in this canonical encoding
is essentially standard 2's complement arithmetic, but with the added benefit that negation is a
constant time operation, obtained by bit inversion.

To be able to discuss a possible implementation of a floating point arithmetic unit it is necessary 
to repeat some material on the fixed point representation from previous publications. We shall 
in Section 2 (adapted from [KM05]) cite some definitions, however here restricted to the binary
representation. Section 3, citing some material from [KMP11] but also expanding it, analyzes the
relation between RN-representations and 2's complement representations. Conversion from the 
latter into the former is performed by the Booth algorithm, yielding a signed-digit representation 
in a straightforward encoding. It is then realized that n + 1 bits are sufficient, providing a simple 
encoding consisting of the n bits of the 2's complement encoding with a round-bit appended, yield- 
ing the canonical encoding. To be able to define arithmetic operations directly on the canonically 
encoded numbers it turns out to be useful to interpret them as intervals, reflecting the sign of what 
has possibly been rounded away by truncation. 

Section 4 then presents implementations of addition, multiplication and division on fixed-point,
canonically encoded RN-represented numbers, to some extent repeating material from [KMP11],
but also expanding the descriptions of multiplication and division. It is noticed that these
implementations are essentially identical to standard 2's complement arithmetic. After introducing a
floating point representation and encodings similar to the IEEE-754 formats, Section 5 shows that
multiplication and division on such floating point operands can be defined in a straightforward
way based on the fixed-point algorithms. Addition and subtraction are discussed in some detail,
split in the traditional "near" and "far" cases. Directed roundings are shown to be realizable based
on a "sticky-bit", but without the need for a rounding incrementation. Section 6 finally concludes
the paper.



2 Binary RN-representations 

For an introduction to the general class of RN-representations for odd and even radix we refer the
reader to [KM05]. Here we are concentrating on the case of radix 2 over the digit-set $\{-1, 0, 1\}$,
with the restriction that the signs of non-zero digits alternate.

Definition 1 (Binary RN-representation)

The digit sequence $D = d_n d_{n-1} d_{n-2} \cdots$ (with $-1 \le d_i \le 1$) is a binary RN-representation of $x$ iff

1. $x = \sum_{i=-\infty}^{n} d_i 2^i$ (that is, $D$ is a binary representation of $x$);

2. for any $j \le n$,

   $\left| \sum_{i=-\infty}^{j-1} d_i 2^i \right| \le \frac{1}{2}\, 2^j$,

   that is, if the digit sequence is truncated to the right at any position $j$, the remaining
   sequence is always the number (or one of the two members in case of a tie) of the form
   $d_n d_{n-1} d_{n-2} d_{n-3} \cdots d_j$ that is closest to $x$.

Hence, truncating the RN-representation of a number at any position is equivalent to rounding 
it to the nearest. Although it is possible to deal with infinite representations, we shall restrict our 
discussions to finite representations, and find for such RN-representations some observations: 
Theorem 2 (Binary RN-representations)

$D = d_m d_{m-1} \cdots d_\ell$ is a binary RN-representation iff

1. all digits have absolute value less than or equal to 1;

2. if $|d_i| = 1$, then the first non-zero digit that follows on the right has the opposite sign, that
is, the largest $j < i$ such that $d_j \ne 0$ satisfies $d_i \times d_j < 0$,

with some numbers having two finite representations, where one has its least significant nonzero
digit equal to 1, the other one has its least significant nonzero digit equal to $-1$.

A number whose finite representation has its last nonzero digit equal to 1 has an alternative
representation ending with $-1$. Just assume the last two digits are, e.g., $d\,1$: since the representation
is an RN-representation, if we replace these two digits by $(d+1)(-1)$ we still have a valid
RN-representation. This has an interesting consequence: truncating a number which is a tie will round
either way, depending on which of the two possible representations the number happens to have.
Note that when a rounding has taken place, a non-zero part rounded away will have
the opposite sign of the last non-zero digit. The representation thus carries information about what
was rounded away, effectively halving the error bound on the result. We shall see below that
this information may be utilized in subsequent calculations.
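
As an illustration only, the alternation condition of Theorem 2 and the two representations of a tie can be checked with a few lines of Python. The helper names `is_rn` and `sd_value` are hypothetical and not part of the original presentation; digits are listed most significant first.

```python
def is_rn(digits):
    """Check the conditions of Theorem 2 on a finite digit list (MSB first)."""
    last_nonzero = 0
    for d in digits:
        if d not in (-1, 0, 1):
            return False
        if d != 0:
            if d == last_nonzero:      # two consecutive non-zero digits of equal sign
                return False
            last_nonzero = d
    return True


def sd_value(digits):
    """Integer value of a radix-2 signed-digit list (MSB first)."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v


# Two RN-representations of 3, obtained by replacing the trailing digits
# "d 1" with "(d+1) (-1)" as described in the text (here d = -1).
rep_a = [1, -1, 1]
rep_b = [1, 0, -1]
assert is_rn(rep_a) and is_rn(rep_b)
assert sd_value(rep_a) == sd_value(rep_b) == 3

# Truncating the last digit rounds the tie 3 either down or up (in units of 2).
print(2 * sd_value(rep_a[:-1]), 2 * sd_value(rep_b[:-1]))
```

The value 3 lies midway between 2 and 4, and the two representations truncate to these two neighbours respectively.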

This rounding rule is thus different from the "round-to-nearest-even" rule required by the IEEE 
floating point standard [IEE08]. Both roundings provide a "round-to-nearest" in the case of a tie,
but employ different rules when choosing which way to round in this situation. But note that 
the direction of rounding in general depends on how the value to be rounded was derived, as 
the representation of the value in the tie situation is determined by the sequence of operations 
leading to the value. However, when employing the canonical encoding based on a 2's complement 
encoding, and the implementation of the basic arithmetic operations later, then the rounding in 
the tie-situations is deterministic. 

3 Encoding Binary RN-represented Numbers 

Here we briefly cite for completeness from [KMP11] the definition and some properties of the
canonical binary representation and its encoding. Consider a value $x = -b_m 2^m + \sum_{i=\ell}^{m-1} b_i 2^i$ in
2's complement representation:

$x \sim b_m b_{m-1} \cdots b_{\ell+1} b_\ell$

with $b_i \in \{0, 1\}$ and $m > \ell$. Then the digit string

$\delta_m \delta_{m-1} \cdots \delta_{\ell+1} \delta_\ell$ with $\delta_i \in \{-1, 0, 1\}$

defined (by the Booth recoding [Boo51]) for $i = \ell, \cdots, m$ as

$\delta_i = b_{i-1} - b_i$ (with $b_{\ell-1} = 0$ by convention) (1)

is an RN-representation of $x$ with $\delta_i \in \{-1, 0, 1\}$. That it represents the same value follows trivially
by observing that the converted string represents the value $2x - x$. The alternation of the signs
of non-zero digits is easily seen by considering how strings of the form $011 \cdots 10$ and $100 \cdots 01$ are
converted.

Hence the digits of the 2's complement representation directly provide an encoding of the
converted digits as a tuple: $\delta_i \sim (b_{i-1}, b_i)$ for $i = \ell, \cdots, m$ where

$-1 \sim (0, 1)$

$0 \sim (0, 0)$ or $(1, 1)$ (2)

$1 \sim (1, 0)$,

where the value of the digit is the difference between the first and the second component.

Example: Let x = 110100110010 be a sign-extended 2's complement number and write the digits
of 2x above the digits of x:

2x            1   0   1   0   0   1   1   0   0   1   0   0
x             1   1   0   1   0   0   1   1   0   0   1   0
RN-repr. x    0  -1   1  -1   0   1   0  -1   0   1  -1   0

where it is seen that in any column the two upper-most bits provide the encoding defined above 
of the signed-digit below in the column. Since the digit in position m+1 will always be 0, there is 
no need to include the most significant position otherwise found in the two top rows. □ 
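
The recoding rule (1) and the example above can be reproduced with a short Python sketch. The function names are hypothetical helpers chosen for this illustration, and bit strings are given most significant bit first.

```python
def booth_recode(bits):
    """Booth-recode a 2's complement bit string (MSB first) into digits in {-1, 0, 1}.

    Implements delta_i = b_{i-1} - b_i, where the bit to the right of the
    least significant position is taken as 0.
    """
    b = [int(c) for c in bits]
    right = b[1:] + [0]                 # b_{i-1}: right-hand neighbour of each bit
    return [r - x for r, x in zip(right, b)]


def twos_complement_value(bits):
    """Value of a 2's complement bit string (MSB first)."""
    v = int(bits, 2)
    return v - (1 << len(bits)) if bits[0] == "1" else v


def sd_value(digits):
    """Value of a radix-2 signed-digit list (MSB first)."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v


x = "110100110010"
digits = booth_recode(x)
print(digits)
print(twos_complement_value(x), sd_value(digits))
```

Both printed values agree, and the non-zero digits of the recoded string alternate in sign, with the last non-zero digit being -1.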

If $x$ is non-zero and $b_k$ is the least significant non-zero bit of the 2's complement representation
of $x$, then $\delta_k = -1$, confirmed in the example, hence the last non-zero digit is always $-1$ and thus
unique. However, if an RN-represented number is truncated for rounding somewhere, the resulting
representation may have its last non-zero digit of value 1.

As mentioned in Theorem 2 there are exactly two finite binary RN-representations of any
non-zero binary number of the form $a 2^k$ for integral $a$ and $k$, hence requiring a specific sign of the last
non-zero digit would make the representation unique. On the other hand without this requirement,
rounding by truncation of the 2's complement encoding also makes the rounding deterministic and
furthermore unbiased in the tie-situation, by rounding up or down, depending on the sign of the
digit rounded away.

Example: Rounding the value of x in the previous example by truncating off the two least
significant digits we obtain

tr_2(2x)            1   0   1   0   0   1   1   0   0  [1]
tr_2(x)             1   1   0   1   0   0   1   1   0   0
RN-repr. RN_2(x)    0  -1   1  -1   0   1   0  -1   0   1

where it is noted that the bit of value 1 in the upper rightmost corner (in brackets) acts as a round
bit, carrying information about the part rounded away by truncation (which here happened to be
a tie situation). □
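
A minimal sketch of rounding by truncation, assuming integer operands with unit 1 in the last place and Python's arbitrary-precision integers as the 2's complement part. The names `rn_truncate` and `rn_value` are hypothetical; the kept bits are `a >> k` and the round-bit is the most significant discarded bit.

```python
def rn_truncate(a, k):
    """Truncate the integer a (canonical encoding (a, 0)) at position k >= 1.

    Returns the pair (a >> k, b_{k-1}): the kept 2's complement bits and the
    round-bit, which is simply the most significant bit that was cut off.
    """
    assert k >= 1
    return a >> k, (a >> (k - 1)) & 1


def rn_value(a, r, u=1):
    """Value V(a, r) = a + r*u of a canonical encoding."""
    return a + r * u


a, r = rn_truncate(-718, 2)
print(a, r, rn_value(a, r))      # the kept bits, the round-bit, the rounded value

# Truncation is a rounding to nearest: the error never exceeds half a unit.
for x in range(-1024, 1024):
    for k in range(1, 6):
        q, rq = rn_truncate(x, k)
        assert abs(x - rn_value(q, rq) * 2**k) <= 2**(k - 1)
```

For the example value x = -718 the exact quotient -179.5 is a tie, and the truncated encoding represents -179.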

The example shows that there is a very compact encoding of RN-represented numbers derived
directly from the 2's complement representation, noting in the example that the upper row need 
not be part of the encoding, except for the round-bit. We will denote it the canonical encoding, 
and note that it is a kind of "carry-save" in the sense that it contains a bit not yet added in. 
Definition 3 (Binary canonical RN-encoding)

Let a number $x$ be given in 2's complement representation as the bit string $b_m \cdots b_{\ell+1} b_\ell$, such that
$x = -b_m 2^m + \sum_{i=\ell}^{m-1} b_i 2^i$. Then the binary canonical encoding of the RN-representation of $x$ is
defined as the pair

$x \sim (b_m b_{m-1} \cdots b_{\ell+1} b_\ell, r)$ where the round-bit is $r = 0$

and after truncation at position $k$, for $m \ge k > \ell$

$RN_k(x) \sim (b_m b_{m-1} \cdots b_{k+1} b_k, r)$ with round-bit $r = b_{k-1}$.

The signed-digit interpretation is available with digits $\delta_i$ from the canonical encoding by pairing
bits using the encoding (2), $(b_{i-1}, b_i)$ as $\delta_i = b_{i-1} - b_i$ for $i > k$, and $(r, b_k)$ with $\delta_k = r - b_k$, when
truncated at position $k$.

The fundamental idea of the canonical radix-2 RN-encoding is that it is a binary encoding of
a value represented in the signed digit set $\{-1, 0, 1\}$, where the non-zero digits alternate in sign.
Using this encoding of such numbers employing 2's complement representation in the form $(a, r_a)$
it is seen that it then represents the value

$(2a + r_a u) - a = a + r_a u$,

where $u$ is the weight of the least significant position of $a$. Note that there is then no difference
between $(a, 1)$ and $(a + u, 0)$, both being RN-representations of the same value:

$\forall a,\; V(a, 1) = V(a + u, 0)$,

where we use the notation $V(x, r_x)$ to denote the value of an RN-represented number.

If $(x, r_x)$ is the binary canonical encoding of $X = V(x, r_x) = x + r_x u$ then it follows that

$-X = -x - r_x u = \bar{x} + u - r_x u = \bar{x} + (1 - r_x)u = \bar{x} + \bar{r}_x u$,

which can also be seen directly from the encoding of the negated signed digit representation.

Observation 4 If $(x, r_x)$ is the canonical RN-encoding of a value $X$, then $(\bar{x}, \bar{r}_x)$ is the canonical
RN-encoding of $-X$, where $\bar{x}$ is the 1's complement of $x$. Hence negation of a canonically encoded
value is a constant time operation.
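
Observation 4 can be sketched in Python, assuming integer operands with u = 1 and using Python's `~` operator as the 1's complement of an unbounded 2's complement integer. The names `rn_negate` and `rn_value` are hypothetical.

```python
def rn_value(a, r):
    """Value V(a, r) = a + r of a canonical encoding with unit u = 1."""
    return a + r


def rn_negate(a, r):
    """Negate by inverting every bit of the encoding, the round-bit included."""
    return ~a, 1 - r


# Negation by inversion is exact for every encoding, and it is an involution.
for a in range(-32, 32):
    for r in (0, 1):
        na, nr = rn_negate(a, r)
        assert rn_value(na, nr) == -rn_value(a, r)
        assert rn_negate(na, nr) == (a, r)
```

No carry propagation is involved, which is why the operation takes constant time in hardware, in contrast to the increment needed for ordinary 2's complement negation.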

It is important to note that although from a "value perspective" the representation is redundant 
(V(a, 1) = V(a + u, 0)), it is not so when considering the signed-digit representation. In this 
interpretation the sign of the least significant digit carries information about the sign of the part 
which possibly has been rounded away. 

Lemma 5 Provided that an RN-represented number with canonical encoding $(a, r_a)$ is non-zero,
then $r_a = 1$ implies that the least significant non-zero digit in its signed-digit representation is 1
(the number was possibly rounded up), and $r_a = 0$ implies it is $-1$ (the number was possibly rounded
down).

Proof: The result is easily seen when listing the trailing bits of the 2's complement representation
of $2a + r_a$ (with $r_a$ in brackets) above those of $a$ together with the signed-digit representation:

2a + r_a    ...  0  1  1 ... [1]        ...  1  0  0 ... [0]
a           ...  .  0  1 ...  1         ...  .  1  0 ...  0
digits      ...  .  1  0 ...  0         ...  . -1  0 ...  0

□

3.1 The Range of p-bit Canonically Encoded Numbers



With $p + 1$ bits in the tuple $(a, r_a)$, where $a$ is a $p$-bit 2's complement representation paired
with the round-bit $r_a$, it is just a non-redundant encoding of a value represented by $p$ digits
over the digit set $\{-1, 0, 1\}$. Assuming that the radix point is at the rightmost end, i.e., the
numbers represent integer values, then the maximal value representable in $p$ digits in $\{-1, 0, 1\}$ is
$1\,0 \cdots 0 \sim 2^{p-1}$ and the minimal value is $(-1)\,0 \cdots 0 \sim -2^{p-1}$, hence the range is sign-symmetric. The
corresponding canonical encodings employing 2's complement are $(011 \cdots 1, 1) \sim (2^{p-1} - 1, 1)$ and
$(100 \cdots 0, 0) \sim (-2^{p-1}, 0)$ respectively.

3.2 An Alternative Interpretation 

But we may also interpret the representation $(a, r_a)$ as an interval $\mathcal{I}(a, r_a)$ of length $u/2$:

$\mathcal{I}(a, r_a) = \left[ a + r_a \frac{u}{2} \; ; \; a + (1 + r_a) \frac{u}{2} \right]$, (3)

when interpreting it as an interval according to what may have been thrown away when rounding
by truncation. For a detailed discussion of this interpretation see [KMP11].

Hence even though $(a, 1)$ and $(a + u, 0)$ represent the same value $a + u$, as intervals they are
essentially disjoint, except for sharing a single point. In general we may express the interval
interpretation as pictured in Fig. 1.

   I(a,0)      I(a,1)     I(a+u,0)    I(a+u,1)
|-----------|-----------|-----------|-----------|
a         a+u/2        a+u       a+3u/2       a+2u

Figure 1: Binary Canonical RN-representations as Intervals

We do not intend to define an interval arithmetic, but only require that the interval
representation of the result of an arithmetic operation $\odot$ satisfies¹

$\mathcal{I}(A \odot B) \subseteq \mathcal{I}(A) \odot \mathcal{I}(B) = \{ a \odot b \mid a \in A, b \in B \}$. (4)



4 Arithmetic Operations on RN-Represented Values

We will here briefly summarize from [KMP11] the realization of addition and multiplication on
fixed-point representations for a fixed value of $u$, but expand on the implementation of multiplication
and division. We want to operate directly on the components of the encoding $(a, r_a)$, not on the
signed-digit representation.

¹ Note that this is the reverse inclusion of that required for ordinary interval arithmetic.

4.1 Addition of RN-Represented Values

Employing the value interpretation of encoded operands $(a, r_a)$ and $(b, r_b)$ we have for addition:

$V(a, r_a) = a + r_a u$

$V(b, r_b) = b + r_b u$

$V(a, r_a) + V(b, r_b) = a + b + (r_a + r_b)u$

The resulting value has two possible representations, depending on the choice of the rounding 
bit of the result. To determine what the rounding bit of the result should be, we consider interval 
interpretations (3) of the two possible representations of the result.

To define the addition operator $\oplus$ on canonical encodings, we want
$\mathcal{I}((a, r_a) \oplus (b, r_b)) \subseteq \mathcal{I}(a, r_a) + \mathcal{I}(b, r_b)$, and $(a, r_a) \oplus (0, 0) = (a, r_a)$, hence in order to keep addition symmetric, we
define addition of RN encoded numbers as follows.

Definition 6 (Addition) If $u$ is the unit in the last place of the operands, let:

$(a, r_a) \oplus (b, r_b) = ((a + b + (r_a \wedge r_b)u), r_a \vee r_b)$

where $r_a \wedge r_b$ may be used as carry-in to the 2's complement addition.

Recalling that $-(x, r_x) = (\bar{x}, \bar{r}_x)$, we observe that using Definition 6 for subtraction yields
$(x, r_x) \ominus (x, r_x) = (x, r_x) \oplus (\bar{x}, \bar{r}_x) = (-u, 1)$, with $V(-u, 1) = 0$. It is possible alternatively to define addition on
RN-encoded numbers as $(a, r_a) \oplus_2 (b, r_b) = ((a + b + (r_a \vee r_b)u), r_a \wedge r_b)$. Using this definition,
$(x, r_x) \ominus_2 (x, r_x) = (0, 0)$, but then the neutral element for addition is $(-u, 1)$, i.e.,
$(x, r_x) \oplus_2 (-u, 1) = (x, r_x)$.
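
Definition 6 and the interval inclusion requirement can be checked exhaustively on small operands. The following is only a sketch, assuming integer operands with u = 1; `rn_add`, `rn_negate` and `interval` are hypothetical helper names.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def rn_add(x, y):
    """Definition 6 with u = 1: the AND of the round-bits is the carry-in,
    the OR of the round-bits is the round-bit of the sum."""
    (a, ra), (b, rb) = x, y
    return a + b + (ra & rb), ra | rb


def rn_negate(x):
    a, r = x
    return ~a, 1 - r


def interval(x):
    """Interval interpretation (3) of an encoding, for u = 1."""
    a, r = x
    return a + r * HALF, a + (1 + r) * HALF


for a in range(-8, 8):
    for b in range(-8, 8):
        for ra in (0, 1):
            for rb in (0, 1):
                s = rn_add((a, ra), (b, rb))
                assert s[0] + s[1] == (a + ra) + (b + rb)          # exact value
                lo = interval((a, ra))[0] + interval((b, rb))[0]
                hi = interval((a, ra))[1] + interval((b, rb))[1]
                assert lo <= interval(s)[0] and interval(s)[1] <= hi

print(rn_add((5, 1), rn_negate((5, 1))))    # x minus x, encoding of zero
```

Subtracting an operand from itself yields an encoding of zero whose 2's complement part is -u and whose round-bit is 1, as stated above.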

4.2 Multiplying RN-Represented Values 

By definition we have for the value of the product

$V(a, r_a) = a + r_a u$

$V(b, r_b) = b + r_b u$

$V(a, r_a) V(b, r_b) = ab + (a r_b + b r_a)u + r_a r_b u^2$,

noting that the unit of the result is $u^2$, assuming that $u \le 1$. Using the interval interpretation
it turns out (for details see [KMP11]) that we do not get proper interval inclusions for all sign
combinations. However, since negation of canonical (2's complement) RN-encoded values can
be obtained by constant-time bit inversion, multiplication of such operands can be realized by
multiplication of the absolute values of the operands, the result being supplied with the correct
sign by a conditional inversion. Thus employing bit-wise inversions, multiplication in canonical
RN-encoding may be handled like sign-magnitude multiplication, hence assuming that both operands
are non-negative:

Definition 7 (Multiplication) If $u$ is the unit in the last place, with $u \le 1$, we define for
non-negative operands:

$(a, r_a) \otimes (b, r_b) = (ab + u(a r_b + b r_a), r_a \wedge r_b)$,

and for general operands by appropriate sign inversions of the operands and result. If $u < 1$ the unit
is $u^2 < u$ and the result may often have to be rounded to unit $u$, which can be done by truncation.
The product can be returned as $(p, r_p)$ with $p = ab + u(a r_b + b r_a) = a(b + r_b u) + b r_a u$, where
the terms of $b r_a u$ may be consolidated into the array of partial products.

Example: For a 5-bit integer example let $(a, r_a) = (a_4 a_3 a_2 a_1 a_0, r_a)$ and $(b, r_b) = (b_4 b_3 b_2 b_1 b_0, r_b)$,
or in signed-digit $b = d_4 d_3 d_2 d_1 d_0$, $d_i \in \{-1, 0, 1\}$, we note that $a_4 = b_4 = 0$ since $a \ge 0$ and
$b \ge 0$. It is then possible to consolidate the terms of $b r_a u$ (shown in brackets) into the array of partial
products²

                                             a_3      a_2      a_1      a_0
                                             a_3d_0   a_2d_0   a_1d_0   a_0d_0     d_0
                                    a_3d_1   a_2d_1   a_1d_1   a_0d_1   [b_0r_a]   d_1
                           a_3d_2   a_2d_2   a_1d_2   a_0d_2   [b_1r_a]            d_2
                  a_3d_3   a_2d_3   a_1d_3   a_0d_3   [b_2r_a]                     d_3
         a_3d_4   a_2d_4   a_1d_4   a_0d_4   [b_3r_a]                              d_4
p_8      p_7      p_6      p_5      p_4      p_3      p_2      p_1      p_0

thus the product is $(p, r_p)$ with $p = ab + u(a r_b + b r_a) = a(b + r_b u) + b r_a u$ and $r_p = r_a r_b$.
In particular for $a = (01011, 1)$ and $b = (01001, 1) \sim 1\,(-1)\,0\,1\,0$:

                             1    0    1    1
                             0    0    0    0           0
                        1    0    1    1  [1]           1
                   0    0    0    0  [0]                0
             -1    0   -1   -1  [0]                    -1
         1    0    1    1  [1]                          1
    0    0    1    1    1    0    1    1    1

hence $(01011, 1) \otimes (01001, 1) = (001110111, 1)$, where we note that $(001110111, 1)$ corresponds to
the interval [01110111.1 ; 01111000.0], clearly a subset of the interval

[01011.1 × 01001.1 ; 01100 × 01010]
= [01101101.01 ; 01111000.00].
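
The worked example can be reproduced with a small sketch of Definition 7, assuming integer operands with u = 1. Deciding the sign of an operand from the sign of its 2's complement part is an assumption made for this illustration, and `rn_mul`, `rn_negate` are hypothetical names.

```python
def rn_negate(x):
    """Constant-time negation: invert all bits, including the round-bit."""
    a, r = x
    return ~a, 1 - r


def rn_mul(x, y):
    """Definition 7 with u = 1, handling signs as in sign-magnitude multiplication."""
    neg_x, neg_y = x[0] < 0, y[0] < 0
    a, ra = rn_negate(x) if neg_x else x      # operate on non-negative operands
    b, rb = rn_negate(y) if neg_y else y
    p = (a * b + a * rb + b * ra, ra & rb)    # (ab + u(a r_b + b r_a), r_a AND r_b)
    return rn_negate(p) if neg_x != neg_y else p


p, rp = rn_mul((0b01011, 1), (0b01001, 1))
print(format(p, "09b"), rp)

# The value of the product encoding equals the product of the operand values.
for a in range(-8, 8):
    for b in range(-8, 8):
        for ra in (0, 1):
            for rb in (0, 1):
                q, rq = rn_mul((a, ra), (b, rb))
                assert q + rq == (a + ra) * (b + rb)
```

With u = 1 the product is exact, so no subsequent rounding is needed; for u < 1 the result would in addition be truncated back to unit u.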

Thus multiplication of RN-represented values can be implemented on their canonical encodings 
at about the same cost as ordinary 2's complement multiplication. Note that when recoding the 
multiplier into a higher radix like 4 and 8, similar kinds of consolidation may be applied. 

² Not showing the possible rewriting of negative partial products, requiring an additional row.

4.3 Dividing RN-Represented Values

As for multiplication we assume that negative operands have been sign-inverted, and that the signs
are treated separately. Employing our interval interpretation (3), to satisfy our interval inclusion
condition (4) we must require the result of dividing $(x, r_x)$ by $(y, r_y)$ to be in the interval:

$\left[ \frac{x + r_x \frac{u}{2}}{y + (1 + r_y)\frac{u}{2}} \; ; \; \frac{x + (1 + r_x)\frac{u}{2}}{y + r_y \frac{u}{2}} \right]$,

where it is easily seen that the rational value³

$q = \frac{x + r_x \frac{u}{2}}{y + r_y \frac{u}{2}}$

belongs to that interval. Note that the dividend and divisor to obtain the quotient $q$ are then
constructed simply by appending the round bits to the 2's complement parts, i.e., simply using the
"extended" bit-strings as operands. To determine the accuracy needed in an approximate quotient
$q' = q + \varepsilon$ consider the requirement

$\frac{x + r_x \frac{u}{2}}{y + (1 + r_y)\frac{u}{2}} \le \frac{x + r_x \frac{u}{2}}{y + r_y \frac{u}{2}} + \varepsilon \le \frac{x + (1 + r_x)\frac{u}{2}}{y + r_y \frac{u}{2}}$. (5)

Generally division algorithms require that the operands are scaled, hence assume that the
operands satisfy $1 \le x < 2$ and $1 \le y < 2$, implying $\frac{1}{2} < q < 2$. Furthermore assume that $x$ and
$y$ have $p$ fractional digits, so $u = 2^{-p}$. To find sufficient bounds on the error $\varepsilon$ in (5) consider first
for $\varepsilon \ge 0$ the right bound. Here we must require

$\left(x + r_x \frac{u}{2}\right) + \varepsilon \left(y + r_y \frac{u}{2}\right) \le x + (1 + r_x)\frac{u}{2}$ or $\varepsilon \left(y + r_y \frac{u}{2}\right) \le \frac{u}{2}$,

which is satisfied for $\varepsilon \le \frac{u}{4}$, since $y + r_y \frac{u}{2} < 2$. For the other bound (for negative $\varepsilon$) we must
require

$\frac{x + r_x \frac{u}{2}}{y + (1 + r_y)\frac{u}{2}} \le \frac{x + r_x \frac{u}{2}}{y + r_y \frac{u}{2}} + \varepsilon$



or

$-\varepsilon \left( y + (1 + r_y)\frac{u}{2} \right) \le \left( x + r_x \frac{u}{2} \right) \frac{u/2}{y + r_y \frac{u}{2}}$,

which is satisfied for $-\varepsilon \le \frac{u}{4}$, since $x \ge 1$ and $y + u \le 2$.

Hence $|\varepsilon| \le \frac{u}{4}$ assures that (5) is satisfied, and any standard division algorithm may be used
to develop a binary approximation to $q$ with $p + 2$ fractional bits, $x = q'y + r$ with $|r| < y 2^{-p-2}$.
Note that since $q$ may be less than 1, a left shift may be required to deliver a $p+1$ signed-digit
result, the same number of digits as in the operands. Hence a bit of weight $2^{-p-1}$ will be available
as a (preliminary) round bit.

The sign of the remainder determines the sign of the tail beyond the bits determined. Recall
from Lemma 5 that when the round bit is 1, the error is assumed non-positive, and non-negative
when the round bit is 0. If this is not the case then the resulting round bit must be inverted, hence
rounding is also here a constant time operation.

Similarly, function evaluations like squaring, square root and even the evaluation of "well 
behaved" transcendental functions may be defined and implemented, just considering canonical 
RN-represented operands as 2's complement values with a "carry-in" not yet absorbed, possibly 
using interval interpretation to define the resulting round bit. 



³ We could also have chosen to evaluate the quotient $\frac{x + r_x u}{y + r_y u}$. However dividing $(x, r_x)$ by the neutral element
$(1, 0)$ would then yield the result $(x + r_x u, 0)$, whereas with the chosen quotient the result becomes $(x, r_x)$. The
relative difference between these two expressions evaluated to some precision $p$ is at most ulp(p)/2.

5 A Floating Point Representation



For an implementation of a binary floating point arithmetic unit (FPU) it is necessary to define
an encoding of an operand $(2^e m, r_m)$, based on the canonical encoding of the significand part (say
$m$ encoded in $p + 1$ bits, 2's complement), supplied with the round bit $r_m$ and the exponent $e$ in
some biased binary encoding. It is then natural to pack the components into a computer word (32,
64 or 128 bits), employing the same principles as used in the IEEE-754 standard [IEE08] (slightly
modified from what was sketched in [KMP11]). For normal values it is also here possible to use
a "hidden bit", noting that the 2's complement encoding of the normalized significand will have
complementary first and second bits. Thus representing the leading bit as a (separate) sign bit,
the next bit (of weight $2^0$) need not be represented and can be used as "hidden-bit". Hence let
$f_m$ be the fractional part of the significand $m$, assumed to be normalized such that $1 \le |m| < 2$,
and let $s_m$ be the sign-bit of $m$. The "hidden bit" is then the complement $\bar{s}_m$ of the sign-bit. The
components can then be allocated in the fields of a word as:

| s_m | e | f_m | r_m |

with the round bit in immediate continuation of the significand part. The exponent $e$ can be
represented in biased form as in the IEEE-754 standard. The number of bits allocated to the
individual fields may be chosen as in the different IEEE-754 formats, of course with the combined
$f_m, r_m$ together occupying the fraction field of those formats. The value of a floating point number
encoded this way can then be expressed as:

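$2^{e - \mathrm{bias}} \left( [\, s_m \bar{s}_m . f_1 f_2 \cdots f_{p-1} \,]_{2c} + r_m u \right)$

where $f_1, f_2, \cdots, f_{p-1}$ are the (fractional) bits of $f_m$, and $u$ is the weight of the least significant position of the significand.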

Subnormal and exceptional values may be encoded as in the IEEE-754 standard, noting that 
negative subnormals have leading ones. Observe that the representation is sign-symmetric and 
that negation is obtained by inverting the bits of the significand. 

We shall now discuss how the fundamental operations may be implemented on such floating 
point RN-representations, not going into details on overflow, underflow and exceptional values, as 
these situations can be treated exactly as known for the standard binary IEEE-754 representation. 
Note again that we want directly to operate on the 2's complement encoding using $s_m, f_m, r_m$,
not on the signed-digit representation. Right-shifts are trivial, but for alignment of operands or 
left normalizing results, we must investigate how in general we can perform left shifts using the 
2's complement encoding. 

Thinking of the value as represented in binary signed-digit, zeroes have to be shifted in when
left shifting. In our encoding, say for a positive result $(d, r_d)$ we may have a 2's complement bit
pattern:

$d \sim 0 \cdots 0\, 1\, b_k \cdots b_{p-1}$ and round bit $r_d$

to be left-shifted. Here the least significant signed digit is encoded as $(r_d, b_{p-1})$. Zero-valued digits
to be shifted in may then be encoded as $(r_d, r_d)$, as confirmed from applying the addition rule for
obtaining $2 \times (x, r_x)$ by $(x, r_x) \oplus (x, r_x) = (2x + r_x u, r_x)$.

It then follows that shifting in bits of value $r_d$ will precisely achieve the effect of shifting in
zeroes in the signed-digit interpretation:

$2^k d \sim 1\, b_k \cdots b_{p-1}\, r_d \cdots r_d$ with round bit $r_d$.
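
The left-shift rule can be sketched as follows, assuming integer significands with u = 1 and ignoring any width limit of the 2's complement part. The name `rn_shift_left` is a hypothetical helper.

```python
def rn_shift_left(a, r, k):
    """Shift the encoding (a, r) left by k positions.

    Copies of the round-bit are shifted into the vacated positions of the
    2's complement part, and the round-bit itself is kept unchanged.
    """
    fill = (1 << k) - 1 if r else 0       # k copies of the round-bit
    return (a << k) | fill, r


# The value is multiplied by exactly 2**k, i.e. zeroes are shifted in
# when the encoding is read as a signed-digit number.
for a in range(-16, 16):
    for r in (0, 1):
        for k in range(0, 5):
            s, rs = rn_shift_left(a, r, k)
            assert s + rs == (a + r) * 2**k
            assert rs == r
```

Doubling via the addition rule is the special case k = 1, giving (2a + r, r).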
5.1 Multiplication and Division

Since the exponents are handled separately, forming the product or quotient of the significands is
precisely as described previously for fixed point representations: sign-inverting negative operands
by bitwise inversion, forming the product or quotient, possibly normalizing and rounding it, and
supplying it with the proper sign by negating the result if the operands were of different signs.



5.2 Addition 

Before addition or subtraction there is in general a need of alignment of the two operand signifi- 
cands, according to the difference of their exponents (too large a difference is treated as a special 
case, see below). The operand having the larger exponent must be left-shifted, with appropriate 
digit values appended at the least significant end, to overlap with the significand of the smaller 
operand. In effective subtractions, after cancellation of leading digits it may be necessary to
left-normalize. Addition is traditionally now handled in an FPU as two cases [Far81], where the "near
case" is dealing with effective subtraction of operands whose exponents differ by no more than
one, where alignment is a constant time operation.



5.2.1 Subtraction, the "near case" 



Here a significant cancellation of leading digits may occur, and thus a variable amount of
normalization shifts on the result are required, handled by shifting in copies of the round-bit. Figure 2
shows a possible pipelined implementation of this case, where lzd(d) is a log-time algorithm for
"leading zeroes determination" of the difference (see e.g., [Kor09]) to determine the necessary
normalization shift amount. This determination is based on a redundant representation of the
difference (obtained in constant time by pairing the aligned operands), taking place in parallel
with the 2's complement subtraction (conversion from redundant to non-redundant
representation). Normalization can then take place on the non-redundant difference without need for sign
inversion.



[Figure 2 (block diagram): inputs m_a, m_b and e_a, e_b; blocks lzd(d) and Subtract side by side, Form Exponent, Normalize, and Adj. Exponent.]

Figure 2: Near Path, effective subtraction when $|e_a - e_b| \le 1$



For simplicity in the figure we assume that $m_a$, $m_b$ and $m_r$ are the 2's complement operands,
respectively the result, together with their appended round-bits.

5.2.2 Addition, the "far case"



The remaining cases dealt with are the situations where the result of adding or subtracting the
aligned significands at most requires normalization by a single right or left shift. Since negation
is a constant time operation we may assume that an effective 2's complement addition is to be
performed of the left-aligned larger operand (appended with copies of the round-bit) and the
sign-extended smaller operand. Rounding can then be performed as usual by truncation, noting that here
there are only two log-time operations, the variable amount of alignment shifts and the addition.
Compared with the IEEE-754 standard number representation the "expensive" determination of
a sticky bit and rounding incrementation is avoided. Figure 3 shows a possible two-stage pipeline
implementation.

[Figure 3 (block diagram): inputs m_a, m_b and e_a, e_b; blocks Align operands, Form exponent, Add/Sub and Round, and Adj. exponent.]

Figure 3: Far Path, add or subtract when $|e_a - e_b| \ge 2$



In the case where the exponent difference exceeds the number of operand bits, it is not necessary
to form the exact sum. The result can be constructed from the 2's complement significand of the
larger operand, supplied with a round-bit obtained by a very simple rule providing a result obeying
the interval inclusion condition (4). Assuming that the smaller operand is to be added, simply
force the round-bit of the result to become equal to the complemented sign-bit of the smaller
operand (and of course in case of subtraction the sign-bit).

To see this, consider the interval interpretation (3) of the significand of the larger operand
$\mathcal{I}(a, r_a)$ together with a bounding interval $[0 \; ; \; \frac{u}{2}]$ for the smaller operand (assumed positive),
where $u = \mathrm{ulp}(a)$ is the unit of the least-significant position of the larger operand:

$\mathcal{I}(a, r_a) + [0 \; ; \; \frac{u}{2}] = [a + r_a \frac{u}{2} \; ; \; a + (1 + r_a)\frac{u}{2}] + [0 \; ; \; \frac{u}{2}]$

$= [a + r_a \frac{u}{2} \; ; \; a + (2 + r_a)\frac{u}{2}]$

$= [a \; ; \; a + u]$ for $r_a = 0$,

$= [a + \frac{u}{2} \; ; \; a + \frac{3}{2}u]$ for $r_a = 1$.

Hence choosing the result as $\mathcal{I}(r, r_r) = \mathcal{I}(a, 1) = [a + \frac{u}{2} \; ; \; a + u]$ it satisfies the interval condition
when the smaller operand is positive. Similarly, if the smaller operand is negative

$\mathcal{I}(a, r_a) + [-\frac{u}{2} \; ; \; 0] = [a - \frac{u}{2} \; ; \; a + \frac{u}{2}]$ for $r_a = 0$,

$= [a \; ; \; a + u]$ for $r_a = 1$,

hence the result can be chosen as $\mathcal{I}(r, r_r) = \mathcal{I}(a, 0) = [a \; ; \; a + \frac{u}{2}]$.

In summary we have:
In summary we have: 
Lemma 8 Given RN-represented floating point operands $A = (s_a, e_a, f_a, r_a)$ and $B = (s_b, e_b, f_b, r_b)$ 
with p-bit significands satisfying $e_a > e_b + p$, then the result of the addition $S = A + B$ can be 
represented as $S = (s_a, e_a, f_a, \bar{s}_b)$. 
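
The round-bit rule of the lemma can be checked numerically against the interval interpretation. The following Python sketch is an illustration only: it assumes the interval $[a + r\frac{u}{2} ; a + (1 + r)\frac{u}{2}]$ for an encoding (a, r) and the bounding intervals $[0 ; \frac{u}{2}]$ and $[-\frac{u}{2} ; 0]$ for the smaller operand, and the names `interval`, `far_result` and `satisfies_inclusion` are hypothetical.

```python
from fractions import Fraction


def interval(a, r, u):
    """Interval interpretation of (a, r): [a + r*u/2 ; a + (1 + r)*u/2]."""
    return a + r * u / 2, a + (1 + r) * u / 2


def far_result(a, r_a, s_b):
    """Result significand when the smaller operand is negligible.

    The 2's complement part a is kept, and the round-bit is forced to the
    complemented sign-bit of the smaller operand (s_b = 1 means negative).
    The old round-bit r_a is not used.
    """
    return a, 1 - s_b


def satisfies_inclusion(a, r_a, s_b, u=Fraction(1)):
    """Check that the result interval lies inside I(a, r_a) + bound on B."""
    lo, hi = interval(a, r_a, u)
    # Bounding interval of the smaller operand: [0 ; u/2] or [-u/2 ; 0].
    b_lo, b_hi = (-u / 2, Fraction(0)) if s_b else (Fraction(0), u / 2)
    res_lo, res_hi = interval(*far_result(a, r_a, s_b), u)
    return lo + b_lo <= res_lo and res_hi <= hi + b_hi


# All four combinations of round-bit and sign-bit pass for every a tried.
print(all(satisfies_inclusion(a, r, s)
          for a in range(-8, 8) for r in (0, 1) for s in (0, 1)))
```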

5.3 Discussion of the Floating Point RN-representation 

As seen above it is straightforward to define binary floating point representations, when the signif- 
icand is encoded in the canonical 2's complement encoding with the round-bit appended. An FPU 
implementation of the basic arithmetic operations is feasible at about the same complexity as one 
based on the sign-magnitude representation of the IEEE-754 standard for binary floating point. 
But since the round-to-nearest functionality is achieved at less hardware complexity, the arithmetic 
operations will generally be faster, by avoiding the usual log-time "sticky-bit" determination and 
rounding incrementation. 

Negation is obtained in constant time by bit-wise inversion, noting that the domain of repre- 
sentable values is sign symmetric. Although one less bit is used for the significand, the round-bit 
provides additional information such that the discretization error is the same as in the IEEE-754 
representation of the compatible format. Notice that if the result is not exact, then the round-bit 
provides information about which direction the rounding took. In effect the round-bit provides 
the same information on the accuracy of the result as the additional bit available in the binary 
IEEE-754 encoding of the significand. Just as in an x86 FPU it is possible to signal exactness of 
a result. 
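
Both the constant-time negation and the sign symmetry of the representable values can be illustrated with a small Python sketch. It assumes that an encoding (a, r) with a p-bit 2's complement part a has the value a + r in units of the last place; the names `value`, `negate` and `representable` are hypothetical.

```python
def value(a, r):
    """Value of the encoding (a, r) in units of the last place."""
    return a + r


def negate(a, r):
    """Negate by inverting every bit, including the round-bit.

    In 2's complement ~a == -a - 1, so the value becomes
    (-a - 1) + (1 - r) == -(a + r); no carry propagation is needed.
    """
    return ~a, r ^ 1


def representable(p):
    """Set of values of all encodings with a p-bit 2's complement part."""
    lo, hi = -(1 << (p - 1)), (1 << (p - 1)) - 1
    return {value(a, r) for a in range(lo, hi + 1) for r in (0, 1)}


vals = representable(4)
print(min(vals), max(vals))  # -8 8
```

Unlike plain p-bit 2's complement, where the most negative value has no representable negative, the set of values here is closed under negation.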

The directed roundings can also be realized at minimal cost. They do require the calculation 
of a "sticky bit", as is also needed for the directed roundings of the IEEE-754 representation, but no 
rounding incrementation is needed here: 

Theorem 9 Let a number after truncation of tail $t$ have encoding $(a, r_a)$ with sign-bit $s_a$; then the 
directed roundings can be realized by changing the resulting round-bit as follows: 

RU : $r_a := 1$ 
RD : $r_a := 0$ 
RZ : $r_a := s_a$ 
RA : $r_a := \bar{s}_a$, 

conditional on the truncated tail $t$ (the "sticky-bit") being non-zero. 

Proof: Consider the case of RU when $r_a = 0$. By Lemma E] the least significant non-zero signed- 
digit of the truncated $(a, r_a)$ is $-1$, and if $t \neq 0$ the value was effectively rounded down, thus $r_a$ 
should be changed to $r_a = 1$, whereas it should not be changed when $r_a = 1$. The other cases 
follow similarly. □ 
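
The round-bit updates of the theorem can be sketched in Python on integer significands. The sketch makes several assumptions that go beyond the text: an encoding (A, R) has the value A + R in units of its last place, truncation drops k low-order bits and takes the most significant dropped bit as the new round-bit, and the tail is non-zero exactly when the remaining dropped bits (including the old round-bit) are not all equal to the new round-bit. The names `truncate` and `directed` are hypothetical.

```python
import math
from fractions import Fraction


def truncate(A, R, k):
    """Drop k >= 1 low-order bits of (A, R); return (a, r, sticky).

    sticky is True when the discarded signed-digit tail is non-zero, i.e.
    when the bits below the new round-bit (including R) are not all equal
    to the new round-bit.
    """
    a = A >> k
    r = (A >> (k - 1)) & 1
    low_bits = [(A >> i) & 1 for i in range(k - 1)] + [R]
    sticky = any(bit != r for bit in low_bits)
    return a, r, sticky


def directed(a, r, sticky, mode):
    """Overwrite the round-bit as in Theorem 9, only if the tail is non-zero."""
    if not sticky:
        return a, r                      # exact result: nothing to change
    s = 1 if a < 0 else 0                # sign-bit of the 2's complement part
    new_r = {"RU": 1, "RD": 0, "RZ": s, "RA": 1 - s}[mode]
    return a, new_r


# Compare with exact directed rounding of x = (A + R) / 2^k to an integer.
for A in range(-64, 64):
    for R in (0, 1):
        for k in (1, 2, 3):
            x = Fraction(A + R, 2**k)
            a, r, st = truncate(A, R, k)
            assert sum(directed(a, r, st, "RU")) == math.ceil(x)
            assert sum(directed(a, r, st, "RD")) == math.floor(x)
            assert sum(directed(a, r, st, "RZ")) == math.trunc(x)
```

In every mode only the round-bit changes; the 2's complement part a is never incremented.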

6 Conclusions and Discussion 

Concentrating on binary RN-represented operands over the signed digit set {−1, 0, 1}, allowing 
trivial (constant time) rounding by truncation, we have previously proposed a simple encoding 
based on the ordinary 2's complement representation, with negation also being a constant time 
operation, which often simplifies the implementation of arithmetic algorithms. Operands in the 
canonical encoding can be used directly at hardly any penalty in the implementation of the basic 
arithmetic operations, e.g., addition, subtraction, multiplication and division, allowing constant 
time rounding. Thus, although the RN-representation encodes a very special signed-digit 
representation, it allows the operations to be performed in a slightly modified 2's complement arithmetic. 

The fixed point encoding immediately allows for the definition of corresponding floating point 
representations, which in a comparable hardware FPU implementation will be simpler and faster 
than an equivalent IEEE-754 standard conforming implementation. 

The particular feature that rounding-to-nearest is obtained by truncation implies that repeated 
roundings ending in some lower precision yield the same result as if a single rounding to that 
precision had been performed. In [Lee89] it was proposed to attach some state information (2 bits) to a 
rounded result, allowing subsequent roundings to be performed in such a way that the problems 
of repeated roundings are avoided. It was shown that this property holds for any specific IEEE-754 
rounding mode, including in particular the round-to-nearest-even mode. But the IEEE-754 roundings 
may still require log-time incrementations, which are avoided with the proposed RN-representation. 
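
That repeated roundings agree with a single rounding can be seen directly from the truncation rule. The Python sketch below is an illustration only: it assumes integer significands and that truncation by k bits keeps the upper bits and takes the most significant dropped bit as the new round-bit. The names `truncate` and `round_twice` are hypothetical.

```python
def truncate(a, r, k):
    """Round (a, r) to nearest by dropping k >= 1 low-order bits of a.

    The old round-bit r is discarded along with the dropped bits.
    """
    return a >> k, (a >> (k - 1)) & 1


def round_twice(a, r, k1, k2):
    """Round by k1 bits, then round the result by a further k2 bits."""
    return truncate(*truncate(a, r, k1), k2)


# Two successive truncations select the same bits as a single one.
for a in range(-256, 256):
    for r in (0, 1):
        for k1 in (1, 2, 3):
            for k2 in (1, 2, 3):
                assert round_twice(a, r, k1, k2) == truncate(a, r, k1 + k2)
```

Both paths return bits taken from the same positions of the original operand, so no state information has to be carried between roundings.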

Thus in applications where conformance to the IEEE-754 standard is not required, the penalty of 
log-time roundings can be avoided by employing the proposed floating-point RN-representation. 
Signal processing may be one such application area: specialized hardware (ASIC or FPGA) is often 
used there anyway, and the RN-representation can provide faster arithmetic with un-biased 
round-to-nearest operations at reduced area and delay. 